Automating RDF Dataset Transformation and Enrichment

Authors

  • Mohamed Ahmed Sherif
  • Axel-Cyrille Ngonga Ngomo
  • Jens Lehmann
Abstract

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim to support this endeavour usually focus on the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how exemplary descriptions of enriched resources can be used to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that it can learn accurate pipelines even when provided with a small number of training examples.


Similar resources

Full Syntactic Parsing for Enrichment of RDF dataset

RDF data extracted automatically often contain long textual literals. This paper shows how to use natural language processing techniques to automatically generate specific RDF triples from the information in the literals. We look specifically at drug indications found in the DailyMed dataset. We develop knowledge schemas to capture its information as well as precise syntactic-based methods of k...


Testing OWL Axioms against RDF Facts: A Possibilistic Approach

Automatic knowledge base enrichment methods rely critically on candidate axiom scoring. The most popular scoring heuristics proposed in the literature are based on statistical inference. We argue that such a probability-based framework is not always completely satisfactory and propose a novel, alternative scoring heuristics expressed in terms of possibility theory, whereby a candidate axiom rec...


Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data

Background: The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connectivity tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. ...


datos.bne.es: a Library Linked Data Dataset

We describe the datos.bne.es library dataset, which makes available the authority and bibliography catalogue from the Biblioteca Nacional de España (BNE, Spanish National Library) as Linked Data. The catalogue contains around 7 million authority and bibliographic records. The records in MARC 21 format were transformed to RDF and modelled using IFLA ontologies. A tool named MARiMBA automates th...


Towards Sustainable Extract-Transform-Load Fusion of Company Data

Openly available datasets originate from different data providers, which range from government agencies and commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph using ETL (Extract-Transform-Load) systems, which perform offline transformation, ontology matching and linking, usually takes many iterations of revisions...




Publication date: 2015